---
title: "Levene's test for homogeneity"
author: "A. Jonathan R. Godfrey"
date: "last updated: 14 September 2016"
bibliography: briefs.bib
output: html_document
---

```{r setup, purl=FALSE, include=FALSE}
library(knitr)
opts_chunk$set(dev=c("png"), fig.align="center")
opts_chunk$set(comment="", fig.path="./Figures/MultipleComparisons")
```

## The procedure

Levene's test procedure is to perform a one-way ANOVA on the constructed response variable
$$ z_{ij} = \left| y_{ij} - \bar{y}_{i\cdot} \right| $$
If the *F* statistic is significant, the null hypothesis of homogeneity of variance is rejected.

### Lots of tests exist

Levene's test is found in many statistical software applications. Many other tests have been created, and no one test is superior to all others under all conditions. The choice of which test to use is therefore a matter of which one works sufficiently well for the range of circumstances to which it will be applied.

@MillikenJohnsonV1E2 review a number of tests and make recommendations based on simulation studies taken from the literature. They note that Levene's test is better than Hartley's or Bartlett's test when the data are not normally distributed. However, in the second edition of their first volume, they add consideration of two more tests which perform better than Levene's test under certain conditions. O'Brien's test works better when the underlying distributions are skewed, while the Brown-Forsythe test is better when the tails of the distribution are heavier.

Hartley's test is linked to the rule of thumb that the ratio of the maximum to the minimum group variance should be less than 4. In fact, the tabulated critical values for Hartley's test statistic are not always around or above 4, especially when the within-group sample sizes are small. This test is not commonly found in software.

Bartlett's test is found in many statistical software applications, but requires the assumption of normally distributed populations, which is often not certain.

O'Brien's test follows the same procedure as Levene's test in that a substitute value is found for each $y_{ij}$ and an analysis of variance is performed on these values. The formula for the transformation is much more complicated than that used for Levene's test and includes the selection of a weight parameter.

The Brown-Forsythe test is a modification of Levene's test, and is described below.

### Modification of Levene's test when data appear to be from a skewed distribution

If the data look like they are from a skewed distribution, @BrownForsyth1974 suggest that the *i*^th^ group mean $\bar{y}_{i\cdot}$ should be replaced by the *i*^th^ group median $\tilde{y}_{i\cdot}$. This procedure therefore uses the response variable
$$z_{ij} = \left| y_{ij} - \tilde{y}_{i\cdot} \right| $$

## When Levene's test for homogeneity is rejected

The best-known option for dealing with heterogeneous variance in a completely randomized design (CRD) is to transform the response variable. There are situations where this does not yield a suitable response variable, but the *F* test procedure is quite robust to differences in the within-group variances, especially if the group sizes are near-equal, or if the groups with the larger within-group variances are also those with the larger sample sizes. @MillikenJohnsonV1E2[p38] suggested that the simple analysis for a CRD is fine unless Levene's test is rejected at the 1% level. If it is rejected, one of the adjusted procedures described below can be used.

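The construction behind Levene's test is simple enough to carry out directly. The following sketch assumes a hypothetical data frame `dat` with a numeric response `y` and a factor `group`; these names are illustrative only. The same chunk shows the Brown-Forsythe variant, which simply swaps the group means for group medians. (The `leveneTest()` function in the **car** package implements both versions through its `center` argument.)

```{r leveneByHand, eval=FALSE}
# Hypothetical data frame `dat` with a numeric response `y` and a factor `group`;
# substitute the names used in your own analysis.
dat$z.mean   <- abs(dat$y - ave(dat$y, dat$group, FUN = mean))    # Levene's construction
dat$z.median <- abs(dat$y - ave(dat$y, dat$group, FUN = median))  # Brown-Forsythe construction

# One-way ANOVA on each constructed response; a small p-value leads to
# rejection of the null hypothesis of homogeneous within-group variances.
anova(aov(z.mean ~ group, data = dat))
anova(aov(z.median ~ group, data = dat))
```

In keeping with the recommendation above, the simple CRD analysis would only be set aside if the p-value from this constructed-response ANOVA falls below 0.01. As a forward reference, the Welch procedure described in the final subsection below is available in base R as `oneway.test()` with `var.equal = FALSE`, which offers a convenient check on any hand calculation:

```{r welchCheck, eval=FALSE}
# Welch's F test for a one-way layout with unequal variances,
# again using the hypothetical `dat` from the previous chunk.
oneway.test(y ~ group, data = dat, var.equal = FALSE)
```
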
### If sample sizes are equal, use the method proposed by @Box1954

Note: @MillikenJohnsonV1 suggested the @Box1954 procedure, but they dropped it in the second edition [@MillikenJohnsonV1E2].

This method uses the standard ANOVA approach but changes the degrees of freedom used for the critical *F* statistic. Given *t* treatment groups, each of size *n* and with observed variance $\sigma_i^2$, we calculate
$$\bar{\sigma}^2 = \frac{\sum_i \sigma_i^2}{t} $$
$$c^2 = \frac{\sum_i (\sigma_i^2 - \bar{\sigma}^2)^2}{t \left( \bar{\sigma}^2 \right)^2} $$
and then find the numerator degrees of freedom as
$$\nu_1 = \frac{t-1}{1+c^2\left(\frac{t-2}{t-1}\right)} $$
and the denominator degrees of freedom as
$$\nu_2 = \frac{t(n-1)}{1+c^2} $$

In a perfect world, where the $\sigma_i^2$ values are all equal, the above formulae reduce to $c^2=0$, $\nu_1=t-1$, and $\nu_2=t(n-1)$, as is the case for the standard ANOVA. The most extreme case would have $c^2=t-1$, which would lead to $\nu_1=1$ and $\nu_2 = n-1$. An ultra-conservative test would therefore use the critical value $F_{\alpha,1,n-1}$.

### If sample sizes are not equal, use the method proposed by @Welch1951

Define $W_i = n_i/\hat{\sigma}_i^2$, where $\hat{\sigma}_i^2$ is the sample variance of the *i*^th^ group, and
$$\bar{Y}^* = \frac{\sum_i {W_i \bar{Y}_{i\cdot}}} {\sum_i {W_i}} $$
Let
$$\Lambda = \sum_i { \frac{ (1-W_i/W_\cdot)^2}{n_i -1}} $$
where $W_\cdot=\sum_i{W_i}$.

The test statistic is then
$$F = \frac{\sum_i{ W_i (\bar{Y}_{i\cdot} - \bar{Y}^*)^2}/(t-1)}{ 1+2(t-2)\Lambda/(t^2-1)} $$
and is compared to the critical value from the *F* distribution with $\nu_1=t-1$ and $\nu_2 = (t^2-1)/(3\Lambda)$.

@MillikenJohnsonV1E2 provide a second approach for the adjusted test procedure, but it is more difficult to apply than the Welch procedure given here.

## References